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Abstract — Using a large deviations approach we calculate the 
probability distribution of the mutual information of MIMO 
channels in the limit of large antenna numbers. In contrast to 
previous methods that only focused at the distribution close to its 
mean (thus obtaining an asymptotically Gaussian distribution), 
we calculate the full distribution, including its tails which 
strongly deviate from the Gaussian behavior near the mean. 
The resulting distribution interpolates seamlessly between the 
Gaussian approximation for rates R close to the ergodic value 
of the mutual information and the approach of Zheng and Tse 
[1] for large signal to noise ratios p. This calculation provides 
us with a tool to obtain outage probabilities analytically at any 
point in the (R,p,N) parameter space, as long as the number 
of antennas A' is not too small. In addition, this method also 
yields the probability distribution of eigenvalues constrained in 
the subspace where the mutual information per antenna is fixed 
to R for a given p. Quite remarkably, this eigenvalue density is of 
the form of the Marcenko-Pastur distribution with square-root 
singularities, and it depends on the values of R and p. 



I. Introduction 

Considerable interest has arisen from the prediction [2], [3] 
that the use of multiple antennas in transmitting and receiving 
signals can lead to substantial gains in information throughput. 
To analyze the theoretical limits of such a MIMO (Multiple 
Input Multiple Output) system, it has been convenient to focus 
on the case of i.i.d. Gaussian noise and input. For the MIMO 
channel model 

y = Hx + z (l) 

with coherent detection and no channel state information at 
the transmitter [2], [3], the mutual information 1^ for a given 
value of the channel matrix H takes the familiar form: 



= logdet(l+pH 1 "H). 



(2) 



where "log" signifies the natural logarithm, p is the signal 
to noise ratio and H is the M x N channel matrix whose 
elements are independent CN(0, 1 /N) random variables. This 
corresponds to the case of N transmitting and M receiving 
antennas and we can define the ratio as fi — M/N. Without 
loss of generality we assume that f3 = M/N > 1; otherwise, 
if jS < 1, we may simply replace p with p new = p/3 in (fJJ and 
interchange the roles of M and N. 

If the channel matrix H varies in time according to a 
stationary and ergodic process, and coding spans an arbitrarily 
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large number of fading states, then the "ergodic" channel 
capacity is given by the mutual information expected value 
E(/jv) [3]. In that case, the gains of the use of multiple 
antennas have been analytically quantified by assuming that 
the number of antennas N is large while fi remains fixed 
and finite; accordingly, H can be viewed as a large random 
matrix. Then, by applying ideas and methods from the theory 
of random matrices, it was shown in [5] that the value of 
the mutual information per antenna I(p,H)/N "freezes" to a 
deterministic value in the large N limit, the so-called ergodic 
average r ag (p). Underlying this result is the fact that the 
very eigenvalue distribution of H' H freezes to the celebrated 
Marcenko-Pastur distribution: 



p(x) 



y(b — x)(x — a) 
2nx 



(3) 



where a,b — ( V/5 + l) 2 are the end-points of its support. 

However, another interesting regime for this channel is 
represented by the case where the channel matrix is random, 
but it varies in time much more slowly than the typical coding 
delay. In this case, usually referred to as the "quasi-static" 
fading channel, H can be considered as a random constant 
and the mutual information /# is a random variable. In this 
regime, the relevant performance metric is the "rate versus 
outage probability" tradeoff [4], captured by the cumulative 
distribution function of Various approaches [6]-[8] have 
shown that the outage probability P 0Ut (R) := P(/jv ^ R) 
becomes asymptotically Gaussian for large N, with mean equal 
to the ergodic capacity R erg = Nr erg {p) and a variance of order 
0(1) in N. This Gaussian variability of the mutual information 
is due to the fluctuations of the eigenvalues of the matrix 
away from the most probable distribution described by the 
Marcenko-Pastur law. Since this Gaussian approximation is 
essentially a variation of the central limit theorem, it only 
applies within a small number of standard deviations away 
from R as . As a result, this approximation fails to capture the 
tails of the distribution, e.g. the probability of the rate R falling 
to half its ergodic value R a - g /2, because this event only occurs 
0(AO standard deviations away from the mean. 

Hence, the tails of the distributions of the mutual informa- 
tion are important, because they correspond to regions with 
low outage probability, where one would want to operate 
a MIMO system. The interplay between low outage and 
multiplexing gain was exemplified in the seminal paper [1] 
where the authors analyzed the asymptotics of the distribution 
of the mutual information in the limit of large p (keeping 
R/logp fixed). They found that the asymptotic form of the 
logarithm of the distribution of the mutual information is a 



2 



piecewise linear function of R/logp, interpolating between 
the discrete set of values: 



log P om (R„) 



(R„ -Mlogp){R n - Nlogp) 
logp 



(4) 



where the intermediate rates are given by R„ = n log p for 
integer n < N < M0 When, in addition to p, N is also large, 
log P ut(R) in © becomes (to leading order) a continuous 
function of R/N. It should be pointed out that this result is 
complementary to the large N asymptotics discussed above, 
since it provides insight in the distribution of the mutual 
information quite far from its peak which, for large p (and 
large N) is situated at ~ Nlogp. However, this expression 
does not give any quantitative estimates of the outage for finite 
p and therefore cannot be used for realistic outage predictions. 

Generalized Marchenko-Pastur Distributions, |3 = 2, p = 10 
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Fig. 1. Generalized MP distributions for SNR=10dB and different values of 
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Fig. 2. Comparison of MP distribution and distribution resulting from shifting 
the rate by two standard deviations from the most probable one r cre . The 
agreement is quite remarkable with the numerical distribution. 

In the meantime, all variants of the large N Gaussian 
approximation of the mutual information [6]-[8] fail spec- 
tacularly for large p, even near the peak of the distribution. 

'it should be noted that [1] analyzed the outage CDF Fj N (R) = ¥(I N < R) 
rather than the probability density P(R) = FJ^(S). However for large p and 
R„ < N\ogp both expressions are identical up to 0(1) corrections. 



Specifically, they all predict that the outage probability is given 
asymptotically by: 



log P out (R) ~- 



(R- Nlogp) 2 
2 log (!-/?-') 



(5) 



where ft = M/N > 1, an expression which is in striking 
disagreement to (|4j. Even though for /3 = 1 the asymptotic 
form of <j4j is recovered within the Gaussian approximation 
[6], [8], the discrepancy for {S + 1 indicates that the limits 
N — » oo and p — » oo cannot be naively interchanged. When 
one takes the large N limit first, one focuses on the most 
probable eigenvalue distribution, which converges vaguely to 
the Marcenko-Pastur distribution 0. However, as can be seen 
in ©, this distribution (almost surely) produces no eigenvalues 
close to zero when j3 > 1 . Nevertheless, the analysis for large 
p focuses at the regime where the eigenvalues are of order 
z = p~ l . As a result, it is not surprising that the large-Af 
Gaussian approximation of the mutual information distribution 
misses the correct behavior. 

In summary, we have two methods, the large-Af, fixed-p 
Gaussian approximation on the one hand and the large-p, 
fixed-Af limit on the other, both having their own regions of 
validity, and both failing to produce quantitative results for 
the outage probability outside their respective regions. Thus, 
one still needs an approach that correctly describes the outage 
behavior of the mutual information distribution for arbitrary p 
and R. 

In this paper, we introduce a large deviations approach 
to calculate the full asymptotic distribution of R. It is for- 
mally valid for large N, but works over the whole range 
of values of R and p. This method bridges the two regions 
of small/intermediate and large signal to noise ratios within 
a single framework and, in effect, it amounts to calculating 
the rate function of the logarithm of the average moment 
generating function of the mutual information. Our approach 
was first introduced in the context of random matrix theory 
by Dyson [9] and has been more recently discussed in [10]. 
It is quite intuitive because it interprets the eigenvalues of 
H'H as point charges repelling each other logarithmically. As 
a byproduct of this approach, we obtain the most probable 
eigenvalue distribution constrained over the subspace of fixed 
total rate R and signal to noise ratio p. This is a generalized 
Marcenko-Pastur distribution that gives the constrained eigen- 
value distribution for values of R far from its ergodic value. It 
is worth pointing out that many of the results presented here 
could be set on a more formal mathematical footing using tools 
developed in [11]. However, we will follow the less formal but 
more intuitive approach developed by Dyson. 

This generalized Marcenko-Pastur distribution can also be 
seen as the inverse of the so-called Shannon transform [12] 
in the following sense: while the Shannon transform pro- 
duces the value of normalized mutual information In IN as 
a functional of the asymptotic eigenvalue distribution of H' H 
(the Marcenko-Pastur distribution), the generalized Marcenko- 
Pastur distribution introduced here boils down to the asymp- 
totic eigenvalue distribution of H' H for a given value of the 
mutual information R = Nr, i.e., when H'H is constrained on 
the subspace defined by r — In IN. 
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A. Outline 



this distribution takes the well-known form: 



In the next section we will introduce the necessary mathe- 
matical methodology. In particular, Section Hl-AI describes the 
mapping of the joint probability distribution of eigenvalues 
of the Wishart matrix to a Coulomb gas of charges with a 
continuous density (discussed in more detail in Appendix IB1 
and the large-deviations analysis of the problem. Next, section 
III-BI deals with the solution of the resulting integral equation 
that produces the most-likely eigenvalue distribution at the tails 
of the full distribution. 

If one is not particularly interested in the details of our 
derivation, section [TT] can be skipped in favour of section [III] 
wherein we present our main results. Specifically, in section 
IIII-AI we rederive the Marcenko-Pastur distribution (that is, 
the most likely distribution without any mutual information 
constraint) to highlight the efficacy of our method. Subse- 
quently, sections IIII-BI and IIII-CI contain our results for the 
cases fi > 1 and [5-1 respectively, while in section IIII-DI we 
show how to calculate the outage probability directly by means 
of the results of the previous sections. Finally, in section [IV] 
we obtain previous results as limiting cases of this method, 
and also examine a number of different limiting cases. 

The proofs of the properties of tame distributions that we 
introduce in section III-AI are given in appendix |A] and in 
appendix |B] we discuss Dyson's original construction of the 
Coulomb gas model. Finally, in appendices |C] and |D] have 
been reserved for the exposition of some technical issues that 
cropped up during our calculations. 

II. Methodology 

Our approach can roughly be divided in two main parts. 
First, in section III-AI we reduce the original problem of 
finding the probability distribution of the mutual information 
to harvesting the minimum energy of a gas of charged particles 
(among other things we show here that the minimum energy 
configuration is unique). Then, in section IH-Bl we will solve 
the integral equation that comes up and actualy obtain the 
minimum energy configuration of the charges. 

Readers who are not interested in the mathematical intrica- 
cies of our work may skip this section altogether and proceed 
to our results in section [TTTJ 



A. Mapping the Problem to a Coulomb Gas 

We begin by establishing our mathematical methodology, 
treading on the elegant footsteps of [13], [14]. Our overall 
aim will be to calculate the probability distribution of the 
mutual information (0, which can be written in terms of the 
eigenvalues A^ of the Wishart matrix H'H as: 



A' 



^]log(l +pA k ) 



(6) 



k=\ 



Note that the aforementioned probability distribution of 
the mutual information thus depends on the joint probability 
distribution of the eigenvalues A\ . . .An of H'H. In its turn, 



P^A X ...A N ) = A N A(Ai . . . A N ) 2 J~~[ Af~ N e~ NXk (7) 



= A N e 



-N 2 E(A) 



(8) 



where An is a normalization constant and A(Ai.../ly) = 
n&yC^i _ Aj) is the Vandermonde determinant of the eigenval- 
ues At- As for the exponent E(A), this is an energy functional 
that will become very useful later on: 



E(A) = jj X ^ - 08 - 1} log Ak) + w X log \ A > - 

k j>k 



h\ (9) 



Note that the normalization we have chosen is such that E(A) 
corresponds roughly to the energy per eigenvalue. 

The cumulative probability distribution (CDF) of the nor- 
malized mutual information In IN can then be written as a 
ratio of two volumes in /1-space: 



F N (r) = ¥(I N IN <r) = 



V r JPA(A)&(r-I N IN)dA 



(10) 



Vtot J P\(A)dA 

where In is given by (O, &(x) is the Heaviside step function 
(0(jc) = 1 x > and ®(x) = if x < 0) and the integrals are 
taken w.r.t. ordinary A^-dimensional Lebesgue measure dA = 
YlidAi. The above CDF is by definition the outage probability 
for the normalized mutual information to fall below r and its 
corresponding probability density can be obtained from ( [TT] ) 
by taking the derivative with respect to r [15]: 



P N (r) = F'(r) 



JP A (A)S(r-I N /N)dA 
fPAU)dA 



(ID 



where we have used the fact that the derivative of the step 
function is the Dirac ^-function: 0'(x) = S(x). 

Our primary goal will be to use ( ITTb in order to obtain 
an analytic expression for the probability distribution function 
of the mutual information In- However, even though V tot is 
the Laguerre-Selberg integral which is known in closed form 
[16], in general there is no standard way to evaluate integrals 
like V r (except for some special cases [17]). Nevertheless, in 
the large A^-limit it is possible to analyze these integrals in a 
systematic way. This so-called Coulomb-gas approach [16] is 
based on the intuitive idea to interpret the eigenvalues A as the 
positions of positive unit charges located on a line, a picture 
first proposed by Dyson [9]. Within this interpretation, the last 
term in the exponent of (0 corresponds to the logarithmic 
repulsion energy, while the first term is the potential due to a 
constant field and the second term is the repulsion of a point 
charge located at the origin@ Using standard large-deviations 
ideas, it is then easy to see that for large the configuration 
of eigenvalues that minimizes the energy E(A) and which is 
consistent with the added constraints will become increasingly 
more probable and end up determining the distribution PN(f) 
itself. 

Now, it is instructive to look at the form of E(A) to get an 
intuitive understanding of the minimum energy configuration 

2 Note that these are simply the potentials that one obtains in classical two- 
dimensional electrostatics. 
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of A in the absence of the constraint In IN = r. As discussed 
above, the first two terms in E(A) correspond to the external 
forces acting on the charges, while the last term represents 
the repulsion between charges. In the absence of the charge 
repulsion the minimum energy configuration will correspond 
to all charges settling at the minimum of the external potential, 
i.e. A/t = f3 - 1 for all k = 1, . . . , N. Of course, the repulsion 
between charges will make the charges move away from that 
point but still, from simple electrostatics considerations, it is 
not expected that this repulsion will carry charges too far away 
from the minimum. As a result, we expect that at the minimum 
of E(A) all charges will be concentrated in the neighborhood 
of /3 - 1. As the number of charges increases, it will make 
sense, at least for configurations with energy E(A) close to 
the minimum, to assume that the charge distribution will be 
approximately a continuous distribution. As a result, all sums 
over A in E(A) may be replaced by integrals. We expect that 
this assumption will hold also in the presence of constraints 
as in dTTT >. 

It is therefore more convenient to focus only on continu- 
ous distributions, whose mathematical analysis is much more 
tractable. We will thus allow ourselves to be guided by our 
intuition and introduce: 

Conjecture 1 (Coulomb Gas). In the large N limit the charge 
distribution P# converges weakly to an absolutely continuous 
density p(x), x > OIn other words, as N — > oo the total charge 
in any interval / cR will be given by: 



am- 1 



p(x) dx. 



(12) 



This assumption is essentially identical to the one in Mehta's 
book [16] and has been extensively employed in the literature 
[9], [10], [13]. Unfortunately, despite its simple and intuitive 
nature, this assumption has resisted most attempts at a rigorous 
proof, thereby giving birth to different approaches, such as 
the one in [11]. Nevertheless, the results obtained there are 
in agreement with the ones obtained with the help of the 
Coulomb Gas assumption and, hence, we feel that our posit 
here is rather mild (see also appendix [B] for a more detailed 
discussion). 

At any rate, in light of the above, we may now write 
the energy function E[A] as a functional over the continuous 
charge densities p: 



£[p] 



J"xp(x) dx - (J3 - 1) Jp(x)logxdx (13) 

- JJ p(x)p(y)\og\x-y\dxdy 

However, to make proper use of the energy functional £ ( fT3l 
we must first make sure that it remains finite over a reasonably 
large class of densities p(x). This leads us to the concept of 
"tameness": 



Definition 2. An integrable function p : 
called s-tame when: 
(i) the "absolute mean" of p is finite: 

x|p(x)| dx < oo; 



f 

Jo 



will be 



(14) 



(ii) there exists some s > such that p is L 1+£ -integrable, i.e. 

(15) 



Jo 



\ P (X)\ 1+S dx < OO. 



Remark 2.1. The phrasing of condition (i) simply reflects 
our interest in tame functions p = px that are probability 
densities of random variables X with values in K.+ . In that 
case, condition (i) simply states that X has finite mean: 



E(Z) 



f 

Jo 



xp(x) dx < oo. 



Remark 2.2. Condition (ii) will be crucial to our analysis. At 
first, it might appear as a mere technical necessity (see e.g. 
section III-BI and appendix but, in fact, it has a very deep 
physical interpretation: a probability density with finite mean 
might fail to have finite energy, making it inadmissible on 
physical grounds (see lemma [3] below). 

Remark 2.3. When it is not necessary to make explicit mention 
of the exponent e, we will simply say that p is tame. Similarly, 
an absolutely continuous (signed) measure cr on R + will be 
called tame when its Lebesgue derivative p(x) = ^j?- is 
tame. Given this equivalence, we will use the two terms 
interchangeably. 

Going back to the energy functional £ of dT3b . we can see 
that condition (i) guarantees that the first term in ( PT31 is finite, 
while (ii) bounds the second and third terms. This is captured 
in the following: 

Lemma 3 (Finiteness and Continuity of £). Let Q be the 

space of tame functions on R + and let £ be defined as in ( 1731 ). 
Then, £[/?] < oo for all p € Q and the restriction of £ to 
any subspace of L l+E -integrable functions with finite mean is 
continuous (in the L 1+e norm). In other words, tame densities 
have finite energy and tame variations in density induce small 
variations in energy. 

We prove this lemma in appendix [A] where we also give 
some background information on the U norms. For now, it 
will be more useful to express the probability density fV(r) 
as the ratio: 

P N {f) = (16) 

where, in accordance with ([SJ, ( fTTT i and ( fT31 ). Z r and Z are the 

partition functions^ 



Jx, 
Jx 



-N 2 £[p] 



-N 2 £[p] 



(17) 



(18) 



and Dp denotes the path-integral measure over the domains 
of tame densities X, X r C Q: 



X = |p € Q : p > and J p(x) dx - 1 
pel: j"p(x) log(l + px) dx - r 



(19) 



(20) 



3 It is worth pointing out that the correction to the term N 2 H[p] in the 
exponent is 0(1) (see appendix [E] for more details). Also a nice analysis of 
the mapping from the X integrals to path integrals over p can also be found 
in [14]. 
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Of course, from a mathematical point of view, constructing 
a measure Dp over the infinite-dimensional space of functions 
is an intricate process which is far from trivial. One way to 
go about this is similar to the construction of the Wiener 
measure for Brownian motion: indeed, Q can be considered 
as a subspace of K^ ' 00 - 1 , and we may thus employ standard 
measure-theoretic techniques (e.g. theorem 3 pp. 148-149 in 
Shiryaev [18]) to obtain Dp after we define it on the "cylinder 
sets" §. tl ... A (Fi, . ..F k ) := {p : p(x x ) e F\ . ..p(x k ) e F k ) c Q. 
However, given that this is a highly involved and technical 
construction that adds little of substance to the result, we prefer 
to follow a more intuitive approach instead; a more in-depth 
discussion and further details are to be found in appendix [B] 

With all these considerations taken into account, we may 
take the large N limit and write: 

lim -1 log P N (r) = lim -1 (log Z r - log Z) (21) 

n^co _/v n->co isl- 

and, by invoking Varadhan's lemma [19], we obtain: 



lim -^logfV(r) = £ - £i(r) 



or, equivalently: 



where 



P N (r) ~ e 



-Ar 2 (£,(/-)-£ ) 



£o 



inf £[p] 

peX 

£i(r) = inf£[p] 

peX r 



(22) 

(23) 

(24) 
(25) 



In other words, we have reduced the problem of determining 
the asymptotic behavior of fV(r) to finding the minimum of 
the convex functional £ over the two convex domains X and 
X r . To that end, we have: 

Lemma 4 (Convexity of £). Let X c Q. be the set of tame 
probability measures: X = Jp € Q : p > and J p(x) dx — 1 J. 
Then, X is a convex subset of the topological vector space Q 
and £ is (strictly) convex on X. 

Again, we will postpone the proof of this lemma until 
appendix [A] However, an immediate corollary is that there 
exists a unique charge density p which minimizes d24l ) and 
d25l >. To find this unique solution - and the corresponding 
(global) minima £o, £.\{r) - it turns out to be more convenient 
to work over the whole space of tame measures Q. and 
introduce Lagrange multipliers for the two domains X and 
X r . This leads to the Lagrangian functions: 

L [p,v,c] = £[p] 

- v(x)p(x) dx - c p(x) dx - 1 j (26) 

H\[p,v,c,k] = & [p,v,c] 

- k(j™p(x) log( 1 + px) dx - r ) (27) 



from which we obtain £<j and £i(r) by maximizing over the 
dual parameters v (non-negativity constraint), c (normaliza- 
tion) and k (mutual information): 



£o = sup inf Ho[p, v, c] 

v>Q;c P 

£i(r) = sup inf L\[p, v, c,k] 

v>Q;c,k P 



(28) 
(29) 



The convexity of Lq, £1 over p ensures that it suffices to find 
a local minimum p(x) for the corresponding Lagrangian L, 
for fixed v, c, k. Then, any solution of k, c that satisfies the 
constraints of p will be unique [20]. It is also worth pointing 
out that the only difference between £o and £i above is that 
the former can be seen as the maximum over L\[p,v,c,k\ 
keeping k — 0; this relation will come in handy later, because 
it allows us to work with L\ and at the very last step set k = 
to obtain £o. 

We are now left to find a local minimum of L\ and the 
easiest way to do this is by looking at its functional derivative 
w.r.t. p. Indeed, recall that the functional derivative of L\ at 
p e X r is the distribution 6L\ [p, v, c, k] whose action on test 
functions e Q. is given by0 



<&Gi[p],0= - 
dt 



&dp + t<p]. 



(30) 



Note now that the expression L\[p + t<p] is well-defined for 
all p 6 X r , e Q thanks to lemma [3] so that, at least, it 
makes sense to study its behavior as t — > 0. In addition to 
that, our convexity result (lemma |4| simplifies things even 
more because, if 5tL\[p\ = for some p e X r , it immediately 
follows that L\ will be attaining its global minimum at 
Then, maximizing the result with respect to k and c 
simply corresponds to enforcing the normalization and mutual 
information constraints that appear in d26b and ( f27b : 



f 

Jo 



p(x) dx 



1 



p(x) log(l + px) dx 



(31) 



(32) 



Furthermore, we must also maximize with respect to v, in 
order to ensure that p(x) be non-negative in R + . This opti- 
mization constraint can be enforced by observing that v(x) = 
when p(x) > and vice-versa, as we shall see below. 

At any rate, once we manage to find a solution to the above 
optimization problem, we will have: 

Proposition 5 (Uniqueness of Solution). Assume that the tame 
probability measure p satisfies the stationarity condition: 



6L[p]=0 (resp. 6Li[p] = 0) 



(33) 



along with the constraint ( 1571 ) (resp. ( liiD , (1521)). Then, p is the 
unique global minimum point of (1241) (resp. ( 1251 )). 

This proposition stems directly from the convexity of £ and 
will be of considerable help to us in what follows because it 

4 Since f2 is a locally convex space, this is just another guise of the 
Gateaux/Frechet derivatives. 

5 Indeed, note that the function w(f) = &i[p + t(q — p)],t £ [0, 1] is strictly 
convex in [0, 1] for any choice of p and q in X r . Thus, if there were some 
q e X r with H\[q\ < XLj [p], we would have w'(0) = (on account of 1301 ) 
but also w(0) > w(l), a contradiction. 



6 



ensures that any stationary point of £, L\ which satisfies the 
relevant constraints will be the (unique) solution to our original 
minimization problem. 

B. Solving the Integral Equation 

Our task now will be to actually find the solution of < f30b . 
subject to the constraints (fJTJ, (|32] l. The solution for £o in 
(l28l > can then be obtained by relaxing the constraint d32l l and 
setting k — in the final result. To that end, a brief calculation 
(see appendix|C]l for the functional derivative for the functional 
derivative SL\[p\ of ( 1301 ) yields the integral equation: 

2 1 p(x')\og\x- x'\dx = x-(J3-\)\ogx (34) 
Jo 

- c — A:log(l + px) — v(x). 

The role of v(x) in the above equation is to enforce the 
inequality constraint p(x) > for all x > 0. It is well known 
[20] that v(x) > only when the probability density p(x) 
vanishes, while when the probability density is positive, v(x) 
has to be zero. 

Of course, solving the above integral equation is no simple 
task, since the inversion process depends on the support 
supp(p) of the density p{x). As discussed in the previous 
subsection (and with a fair amount of hindsight gained from 
the Coulomb gas analogy), we will be looking for compactly 
supported solutions that are continuous in (0, oo); in other 
words, we will be assuming that supp(/) = [a,b] where 
< a < b < oo. 

There is one important issue that must be mentioned here: 
when the dimensions of the channel matrix attain the critical 
value /3 = 1 , we will see that p exhibits two different behaviors 
depending on the rate r. On one hand, we could have a > 
which, by continuity, introduces the constraint p(a) = 0; on 
the other hand, we could also have solutions with a = (which 
impose no extra constraints because p is assumed continuous 
only on (0, oo)). If the rate r is greater than some critical value 
r c , it turns out that solutions with a > must be rejected 
because they attain negative values. In that case, we are led 
to solutions with a — which have no such problems; the 
converse happens when r < r c , and when r — r c the two 
solutions coincide. 

Having said all that, we may now return to ((34), where we 
have v > if and only if p = 0. Thus, by restricting x to lie in 
the interval [a,b], we may henceforth ignore v(x) altogether. 
Furthermore, to eliminate c for the moment, a differentiation 
of (1341 with respect to x yields: 



27 



C^ldx' 

Ja X-X' 



1 



B-\ 



x + z 



= fix) (35) 



where z = 1/p and D 3 denotes the principal value of the integral 
(it appears because of the absolute value in d34"l i above); 

also, for technical reasons, we will need the domain of / to 
be all of R and we thus set / = outside [a, b]. 

The above equation has a straightforward physical meaning: 
it represents a balance of forces at every location a < x < b, 
because the repulsion from all other charges of the distribution 
located at x' (the LHS expression) is equal to the external 



forces (RHS). For f3 > 1 we intuitively expect that p(x) must 
vanish at x — because in this case the force from the finite 
charge density located at x — (the second term of (|35"l l) 
would be infinite. As a result, we intuitively expect that a > 
for all y0 > 1 ; this expectation will be vindicated shortly. 

Indeed, the solution of this integral equation for general 
f(x) can be obtained using standard methods from the theory 
of integral equations [21], [22]. So as not to interrupt the 
presentation, we will postpone the details until appendix [C] 
and will only give the final result here: 

(36) 



P(x) = 



Ja 



2tt 2 V(x — a)(b — x) 

. _ k Vg+ggj+g Q3-1) V^A 



+ C 



2n V(* — a){b — x) 
where C, C are unknown constants to be determined by the 
condition p(b) = 0. 

As we explain in appendix [C] this formula is valid only 
when the function / is itself Z/-integrable for some r > 1. 
This is always true if j3 — 1 but, as we have already hinted at, 
this case has its own set of subtleties, analysed at length in 
section IIII-CI In particular, we obtain two different solutions 
depending on whether the support of p extends to or not 
(imposing the constraints a — or p(a) = respectively), 
but only one of them is physically admissible (i.e. is a tame 
probability measure lying in the rate-constrained domain X r ). 

On the other hand, this dichotomy ceases to exist when 
B > 1. Indeed, if B > 1 and a = 0, the LHS of @5} is no 
longer integrable. However, the RHS of (TSBT l is L 1+E -integrable 
whenever p is £-tame itself, on account of the properties of the 
finite Hilbert transform [22] (see also appendix 0. We thus 
conclude that any solution to (TSBT l whose support extends to 
cannot be tame and will thus have to be rejected by default. 
As a result, the support of p for /3 > 1 has to be bounded away 
from 0, thus leading to the constraint p(a) = and proving 
our intuitive expectation above. 

So, starting with the general case a,b > 0, we find that 
the constraint of continuity requires that the distribution p(x) 
vanish at the endpoints a, b of its support. The condition p(b) = 
determines the value of C in (T36t resulting in the following 
form for p(x): 

PW = r^fl-^,/^-^^ (37) 



2n V* - a \ (x + z)yb+z x \ b) 
The additional condition p(a) = (when a > 0) results to 



p(x) 



1 y/(b - x)(x - a) I B-l 
x + 



(38) 



2tt x(x + z) \" VoT? / 
with the value of a determined (as a function of b and k) by 
the equation: 



B-l 



I. 



(39) 



V(a + z)(b + z) ^[ab 
Demanding that p be properly normalized as in QTT l, imposes 
the constraint: 



f 



p(x)dx 



a + b-2k-2(B-l) 



zk 



2 V(z + a)(z + b) 



= 1. 
(40) 
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In appendix [D] we show that ( f39l > and (|4Q] i admit a unique 
solution a, b for any given £ and, as a result, proposition [5] 
guarantees the existence of a (necessarily unique) density p(x) 
that minimizes |29l >. 

Now, given the resulting solution p{x) we can readily 
calculate the minimum energy £[p] itself: 

xp(x) dx - (J3 - 1) I /?(x)logxofx 



[a,b] 2 
■b 



ff[a 

- if 

2 J„ 

2 i 



pWpO 7 ) log |x - y| c/y c/x 



1 f* 
-1 J 



xp(x)dx — | p(x) log x c/x 



p(x)log(l + px)dx + - 



(41) 



where in the second line we eliminated the double integral by 
substituting it from d34l i [13]. As for the value of c itself, it 
can be determined by evaluating (l34l l at a fixed value of x, 
say x — a: 

c = a-(J3- l)loga -fclog(l +pa) (42) 

r b 

- 2 I log(x - a)p(x)dx 



Inserting this in d4TT > then yields: 



1 r* 

2 1 

-r 



xp(x)dx - 



p-i r" 

2 J fl 



/?(x)logxofx (43) 



p(x) log(x — a)dx 



+ - (k (r- \og(l +pa)) + a-(/3- l)\og a) 



III. Results 
A. Marcenko-Pastur Distribution 

Now, before actually deriving £ i for general values of p and 
r, it is instructive to calculate the most probable distribution 
of eigenvalues without any mutual information constraints, 
i.e. the well-known Marcenko-Pastur distribution. This can be 
immediately extracted from the analysis in Section IH-BI by 
setting k — 0. Solving for a, b in d39t , ( |40T > gives 



a = (^6- if 
b = (^6+l) 2 



(44) 



and d38l then takes the well-known form (O. Note that when 
P = 1, the lower endpoint vanishes (a = 0) and a square-root 
(integrable) singularity appears in p{x) in ©. We may also 
evaluate the energy functional £o using from d43l by setting 
k = 0. In this way, we get: 



1 f * 

2 1 



xp(x) £fr + I (a - (fi - 1 ) log a) (45) 



a - 1 r r 

—j— J P(x) log xdx- J 



p(x) log xdx— I p(x) log(x — a) dx 



and, after some algebra, we can rewrite the above expression 
in the closed form: 



— + - - log A 

32 2 6 



log(aA) 



(46) 



where A = b - a and the function Q(x,y) is given by [23]: 



Q(x,y) 



^ Jo t + 



y 



dt 



(47) 



= -2 ^JyO~Ty) log 
+ (l+2;y) log 



V^ 1 +3')+ V^ 1 + x ) 



V 1 +y+ 



Vi + x + 



J ( Vi +x - v^) 2 



B. Generalized Marcenko-Pastur Distribution: f3 > 1 

We will now present the results of our analysis, starting 
with the case /3 > 1 . First off, it is worth commenting on (|38| > 
which gives the asymptotic density of eigenvalues constrained 
on the subspace with fixed total rate R = Nr in the large 
Af limit. It is quite remarkable that even away from the 
(most probable) Marcenko-Pastur distribution, the constrained 
eigenvalue distribution "hardens" to a deterministic one. In 
Fig. Q] we plot examples of such eigenvalue distributions. 
Finally, we can readily integrate d32l with p(x) given by d38l ) 
to obtain 



= I Pi* 

Ja 

1 A M ^( C ' +Z a+Z \ 

= logAp+ — — Q — — . — — 

2^(z + a)(z + b) \ A A I 



)log(l + px)dx 
Ak 



(48) 



+ -(i = )q(— .-) 

2\ ^( z + a )(z + b)) \ A A/ 

where Q(x,y) is given in d4Tb . 

Based on the arguments discussed in the previous section, it 
suffices to show that there exists a distribution p(x) in the form 
of d38l satisfying the constraints OTI ). d32b . This corresponds 
to finding values of a, b, and k that satisfy d39l ), d40l and d48l ). 
while at the same time maintaining p(x) > for all x € [a,b]. 
If such a solution exists, according to Theorem |3J it will be 
the unique solution. 

In Appendix |Pl we show that equations d39t and d40b admit 
a unique solution for any k. We therefore only need to show 
that d48| i has a solution in k for any r > 0. To do so we first see 
that for k — > -oo the solution of ([39]), (@0| is a ~ z( y/fi- l) 2 /\k\ 
and b ~ z( y[p+\) 2 l\k\\ then, inserting these solutions into ( |48| ). 
we see that it becomes to leading order r ~ @/\k\. On the other 
hand, for k -> oo (l39l . (|4D1 give a ~ y/k+p -z/2 - 1 and 

~ -\Jk + p - z/2 + 1, resulting to r ~ \ogkp. Thus, the right- 
hand-side of (l48l i takes all values in (0, oo) as A: spans the real 
line (-oo, oo) and hence, due to the continuous nature of (08J, 
we see that there exists a solution k(r) for all r > 0. This 
shows that the corresponding solution p{x; r) is the unique 
minimizing distribution of £ in X r . 



In Fig. [2] we compare this distribution with the correspond- 
ing empirical probability distribution function obtained by 
numerical simulations: the agreement is quite remarkable. 

We may now calculate the value of £1. Starting from ( l43l l. 
plugging in p(x) from (138b . and finally integrating gives us: 

(49) 

\2\ 



Ei = 



A a 8-1 
_ + __ logA __l og(aA) 



+ - 



r - log(l + pa) 
Ak 

2 V(z + a)(z + b) 



z(y/b + z- y/a+z) 
4 V(a + zX& + z) 



where Q(x,y) is given by ( |47] i. 



C. Generalized Marcenko-Pastur Distribution: f3 — 1 

The case /J = 1 deserves special attention. In this case the 
logarithmic repulsion from the 5-function density of eigenval- 
ues at the origin in ( fT3] l and 041 ) is no longer present. Still, 
we may try to solve the problem as in the /3 > 1 case, i.e. by 
trying to find the endpoints a, b of the distribution's support in 
such a way that all constraints be satisfied. In this direction, it 
is straightforward to show that the support and normalization 
conditions ( |39l and OTT ) yield the following support endpoints 
for j8= 1: 

a = ( Vifc + 1 - if - z 

b = ( V* + 1 + if - z. (50) 

As a result, the probality density function p becomes: 

1 y/(b - x)(x - a) 

P(x) = (51) 

In x + z 

The value of the parameter k can be obtained in a unique 
way from the mutual information condition, which now reads: 



r - logp = (k+ l)log(fc + 1) - klogk - 1. 



(52) 



In its turn, this can be used to evaluate the value of the outage 
exponent: 

£, - £o = k ~^-(r - logp) + k - \ -z - (53) 

We can immediately see that the above results can be 
problematic. While they are consistent for large enough k 
(and hence for large enough r) they do not hold for smaller 
k and r. Specifically, for k < k c (z) '■= z + 2 yi, the value of 
the lower limit of the support a becomes negative, which is 
unacceptable by definition. Accordingly, this analysis is only 
valid for k > k c (z) or, equivalently, for r > r c (z), where: 

r c (z) := -logz-l+2(Vi+l) 2 log(Vi+l) (54) 
- Vi( Vz + 2) log( Vi( Vz + 2)) > r erg 

For the opposite case (r < r c ) we can no longer allow a 
to be a free variable. Instead, because p(x) = for x < 0, 
the charge density becomes confined at the boundary x — 0. 



Thus, we need to look for solutions of (l34l with a — 0; in 
that case, the charge density has a square-root singularity at 
x — (instead of vanishing continuously). This is actually 
quite natural since we expect that, for k — (or, equivalently, 
for r — r elg ), the charge distribution should take the form of 
the /? = 1 Marcenko-Pastur density: 

V4 - x 



p{x) 



2n yfx 



Indeed, for general b, k, the distribution becomes: 



p{x) = 



yfb~- 



x + z — 



k^ 



2n(x + z) Vx \ VF+z 
and the normalization condition OTb implies 

1-2 



1 - 



VI 



(55) 



(56) 



(57) 



It can easily be shown that $5% has a unique solution in b 
for all k. It is also rather straightforward to show that for 
k > k c (z) the constant term in the last parenthesis in ( l56l ) 
(namely z — k yfz/ y/b + z) is negative. As a result, this solution 
is not valid for k < k c (z) because the charge density becomes 
negative at some point x > 0. We therefore see that, as a 
function of k, the two solutions above (one with a > and the 
other with a = 0) are mutually exclusive. 

In the last case (a = 0), the mutual information condition 
(l32l implies: 



logp + 2(k + l)log 



Vz+ Vz + b 



- \(^fz~Tb- Vif log (z(z + £)) (58) 

and we can use this expression and (l57| i to obtain b and k. 
It is then easy to show that both r vs k equations above 
have at least one solution in the domain of their validity in k. 
Indeed, for k — > -oo d57l i gives b ~ 4z/\k\, in which case (l58l 
can be approximated by r ~ 1 /\k\. In contrast, in the large k 
regime, ( |58l gives r ~ log kp, just as in the /3 > 1 case and 
again, because of continuity, this guarantees the existence of 
a solution k(r) for every r > 0. 

After solving for b and A: as a function of r and z, £i can be 
readily calculated. Using the fact that £o = 3/2, the exponent 
of the probability distribution P(r) becomes: 



£i-£ = - I r - logp - 2 J ~ log 4 _ Hog 2 



+ ^(^-4)(4z + 3/7+l) 



(59) 



We thus see that £, = £ = 3/2 for k = and = 4. 

It is also worthwhile to analyze the nature of the transition 
at r = r c . Surprisingly, the transition is third-order in the sense 
that the first and the second derivatives of £i(r) at r — r c are 
both continuous. If we denote £i±(r) the exponent for r % r c , 
respectively, and we expand it around r — r c , we find that for 
small r - r c 

£i±(r c + x) - £i(r c ) = a\x + ^-a 2 x 2 + ja 3± x 3 + 0(x 4 ) (60) 
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where 



PDF of Normalized Throughput; p = 2, N = 5, SNR = 100, 10 runs 



a | 

a 3+ 
a 3 _ 



log 

(k c + l)k 

VZ«3 



(61) 
(62) 

(63) 
(64) 



D. Outage Probability 

Due to the factor A^ 2 in the exponent of P n 



exp [-A? 2 (£i(r)-£ )], P(r) falls off rapidly away from its 
peak and it would be useful to include the normalization factor 
in front of the exponent. To do this we need to integrate 
exp [-Af 2 (£i(r) - £n)] over r. Despite the quite complex nature 
of the dependence of £i on r — R/N (which we have made 
explicit), this is a straightforward exercise due to the factor A^ 2 
in the exponent. As we shall see in Section [IV] for r close to 
r erg , £i(r) - £o ~ (r— r elg ) 2 /v elg , where v erg is the variance of 
the mutual information distribution. As a result, we may use 
Watson's lemma [24] (a special case of Varadhan's lemma), 
to evaluate the asymptotic value of the integral: 



Jo 



-N 2 (£,(r)-S. ) 



dr 



f 

Jo 



« z |i-rcrg) z 



As a result, we have 
P(r) 



dr 



3 -JV 2 (£l(r)-£ ) 



y]2lTV t 



(65) 



(66) 



To calculate the outage probability P ut('") = POW < Nr) to 
leading order in N, we first note that for r < r ag , £i(r) is a 
decreasing function of r. Therefore, the leading order behavior 
will be dominated by the value of the exponent at r. Using 
Watson's lemma once again we then obtain the following 
expression for the outage probability: 



PouM = Wn < Nr) 



1 



_ e -A- 2 (£ tW-E q) 

N\E\(r)\ 

C -,V 2 (£,(,)-£()) 



r > r„ 



r < r„ 



(67) 



I N\E\(r)\ 

where £[{r) the derivative of £i with respect to r. To leading 
order in N we see that the exponent of P(r), i.e. -Af 2 (£i(r)-£o) 
coincides with that of P ou ,{r) for r < r as and with the exponent 
of 1 - P out (r) for r > r erg . 

In Fig. |2 we plot the logarithm of the probability density 
function (PDF) P(r) as a function of the throughput r and 
we compare the result with the two other asymptotic forms, 
namely the Gaussian approximation of the mutual information 
[6] and the large-p asymptotic result given by dU [1]. We see 
that our result performs much better at low outage, even at 
large p = 100. Fig. |4] shows more clearly the different trends 
of this curve compared to the other two approximations. 

IV. Analysis of Limiting Cases 

It is worth analyzing the results of the previous section 
in specific limiting cases of the parameter space (p,r,/3). 
That way we make connection with already existing results 



- Coulomb Gas Method 
Gaussian Approximation 

- Large SNR Approximation 
Numerical Simulation 



4.4 



4.6 



Normalized Throughput r=R/N (nats/antenna) 

Fig. 3. Plot of the logarithm of the pdf of the mutual information In IN 
for /? = 2 and comparison to the Gaussian approximation and the large-p 
asymptotic result obtained by Zheng and Tse given by {4) [ 1 ] . The numerical 
result for N = 5 follows closely our result, even at large p = 100. 

PDF of Normalized Throughput; p = 2, SNR = 1 00 




Normalized Throughput r-Ft/N (nats/antenna) 



Fig. 4. Further comparison of our result for the pdf of the mutual information 
In IN with that of the Gaussian approximation and the Zheng - Tse result 



in specific regions, and also describe the behavior of the 
probability density of P(r) in other, unexplored regions, which 
hitherto have defied asymptotic analysis. 

A. Gaussian Region r ~ r erg 

The most relevant limiting case is the Gaussian regime: 
after all, the Gaussian approximation as well as the present 
approach assume that the number of antennas Af is large. The 
difference is that our approach does not focus only in the 
region of N\r-r eTS \ = 0(1), where the Gaussian approximation 
should be valid. To reach that limit, we need to analyze the 
small lc region of d48b . d58l l since, in the limit k — 0, both 
equations reduce to r — r erg (p), where the normalized ergodic 
mutual information r erg is well known to be: [5], [6], [25] 



r erg = logM +/?log 



i+e. 



.(!-„-') 



with: 



M = i|l+p08-l) + ^/(l+pOS-l)) 2 +4pJ 

Expanding ((48), d58l for small k we then get: 
r = r erg + £v erg + 0(k 2 ) 



(68) 



(69) 



(70) 
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where the quantity 



log 



(1 - w) 



(71) 



coincides with the variance of the mutual information distri- 
bution as analyzed in [6], [8]. Expanding £i - £o for small k 
then yields 

Si -£o = ^(r-r erg ) + 0(k 3 ) (72) 

Plugging ( fTOl i into the above equation gives the following 
expression for the exponent of P(r) 



(r - re,) 2 



2v, 



+ 



(('"-'■erg) 3 ) 



(73) 



This is exactly the Gaussian limit of the mutual information 
distribution discussed in various papers in the past. 



B. Large p Approximation: r < r erg 

Next we analyze the behavior of the probability distribution 
of r in the large p limit, while keeping the ratio r/ log p finite 
and less than 10 Since in the large p limit r erg ~ logp, the 
region q < 1 with p » 1 corresponds to k < 0, equations d40| i, 
( l39l will admit the following solutions for a, b: 



4(l- 9 )GB- 9 ) 
4q 



(74) 
(75) 



where q-rj log p and we are assuming that < q < 1 . 

Now, note that the lower end of the spectrum has become 
of order z - 1/p, while the upper limit is still finite, just as 
expected. It is also interesting to calculate the proportion of 
eigenvalues that are in the neighborhood of z = 1/p when 
z — > 0. Indeed, by integrating the probability distribution p(x) 
(l38l from a = 0{z) d74l i to Lz for some (arbitrarily) large L 
we get 



lim limP(x/z < L) = 1 

L-yoo z ^0 



q 



(76) 



Thus, the proportion of "small" eigenvalues is simply 1 - q, 
in agreement with [1]. Plugging d74l . (ITBI l into the equation 
for £i then gives the expected result for the exponent: 



£i-£ ~ logp [(l-q)(J3-q)] 



(77) 



which is exactly the diversity exponent (divided by N 2 ) of [1]. 

Finally, it is worth comparing the two asymptotics discussed 
above. In the previous section the eigenvalue distribution did 
not deviate significantly from the most probable Marcenko- 
Pastur distribution, since k was assumed to be small. In 
contrast, here k is finite, equal to k = 2q - 1 - B and the 
lower end of the distribution of eigenvalues in this subspace 
of fixed r = q logp has to become very small, of order 1/p. 

6 This is the region analyzed in the diversity-multiplexing trade-off [1]. 



C. Large p Approximation: r > r erg 

The regime of large p and fixed q-rj log p < 1 is 
relevant in the analysis of the link-level outage probability. 
However, the opposite regime of q > 1 is also of interest in a 
cellular setting with many multi-antenna users receiving data 
in a TDMA fashion from a single multi-antenna base-stationQ 
In this context, to analyze the system level throughput, it is 
the higher end of the probability distribution of the link-level 
mutual information that is important [7], [26]. Therefore, it 
is worthwhile to calculate the probability distribution of r for 
large p with q > 1. 

Interestingly enough, the behavior here is quite different 
from the q < 1 case. Here k ~ p q ~ l and 

a ~ (y/k+p-lf (78) 
b ~ (y'k + B+lf (79) 



resulting to 



(80) 



independent of B. The resulting probability distribution of r is 



P(r) 



(81) 



We see that when N is not too small, the probability of finding 
In significantly larger than its ergodic value is extremely 
small (in fact, doubly exponentially small in r). This is the 
manifestation of the fact that scheduling the best user in a 
MAC-layer in a multi-antenna setting does not seem to provide 
any clear advantage. 

This result has the following intuitive explanation. For large 
p and r > r erg all eigenvalues of the matrix H' H will be large 
and the only constraint imposed upon them is ( [321 . Thus, we 
may say that all of them are constrained by the condition 
r ~ log(l + pAj) ~ logpAj i.e. Aj ~ e'/p. In this limit, the 
exponent is roughly times the sum of the eigenvalues. 

D. Limit r — > 

The final regime that is interesting to analyze is when r — > 0, 
independently of p. This corresponds to the far left tail of the 
plots in Fig. |4]|3] In this regime the solution of d48l (l58l for 
small r is r ~ B/\k\ for k — > -oo and the corresponding values 
of a, b are: 



resulting in: 



-B\og 



er 



(82) 
(83) 

(84) 



This means that the probability distribution P(r) has a tail of 
the form 

/ \MN 

7 In that case a MAC-layer scheduler would be transmitting to the user with 
the best channel, for example. 
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The above behavior of P(r) for small r is easy to understand: 
for r to be small, we need all matrix elements of the matrix 
H to be small. In fact, since H appear in a quadratic way in 
the mutual information equation (f2]i we need all MN elements 
of H to be less than ~ ^r/p. However, there are 2MN real 
degrees of freedom in the M x N complex matrix H. Hence 
the allowed volume of space scales as ~ (r/p) MN as above. 

It should also be noted that the behavior P(r) ~ p~ MN 
of the mutual information cumulative distribution function 
is precisely what is known as the "full diversity" of error 
probability, i.e., the SNR exponent of error probability for 
fixed but very small rate R while SNR p increases is p~ MN , 
which corresponds to the left extreme point of the Zheng-Tse 
exponent [1]. 

V. Conclusion 

In this paper we have used a large deviation approach, 
first introduced in the context of statistical mechanics [9], 
[13], to calculate the probability distribution of the mutual 
information of MIMO channels in the limit of large antenna 
numbers. In contrast to previous approaches that focused 
only close to the most probable eigenvalue distribution, [6]- 
[8], we also calculate the distribution for rare events in the 
tails of the distribution, corresponding to instances where 
the observed mutual information differs by 0(AO from the 
most probable value of the asymptotic distribution (where 
the Gaussian approximation for the mutual information is 
invalid). We find that the distribution in those tails is markedly 
different from what happens near the mean and our resulting 
probability distribution interpolates seamlessly between the 
Gaussian approximation for rates close to the ergodic mutual 
information and the results of [1] for large signal to noise ratios 
(where the outage probability is given asymptotically by (0]i). 
Our method thus provides an analytic tool to calculate outage 
probabilities at any point in the (R,p,N) parameter space, as 
long as N is not too small. Additionally, this approach also 
provides the probability distribution of eigenvalues constrained 
in the subspace where the mutual information is fixed to R for 
a given signal to noise ratio p. Interestingly, this eigenvalue 
density is of the form of the Marcenko-Pastur distribution with 
square-root singularities. 

Finally, it is worth pointing out one very interesting gen- 
eralization, namely to include the correlations of the channel, 
a problem which is considerably more difficult compared to 
the present one. Some preliminary mathematical tools have 
already been developed in [27], and we will expand on this in 
the future. 

Appendix A 
Properties of tame probability measures 

This appendix is largely devoted to the study of the energy 
functional £: 

£[/?] = Jxp(x)dx - (P - 1) Jp(x) log x dx (foi l 
p(x)p(y)\og \x-y\dxdy 



ff> 



where p e Q is a tame density. As evidenced by definition [2] 
where the concept of tameness was introduced, an extremely 
important part in our analysis will be played by the so-called 
U norm || ■ || r defined by: 



■ if 



\m\ r dx 



Ur 



(86) 



If a function / has finite U norm it is called Z/-integrable and 
the space of such functions constitutes a complete vector space 
(also denoted by IT). The completeness of this space follows 
from Holder's inequality which we state without proof and 
which will be of great use to us [28]: 



WfgWl < WfWMl 



(87) 



whenever the exponents r, s > 1 are conjugate, that is: r + 
s- 1 = 1. 

We will also make heavy use of the convolution / * g 
between two functions / and g: 



(/*s)W 



y)g(y) dy. 



(88) 



If / € L l and g e U, Young's inequality (pp. 240-241 in [28]) 
states that their convolution will be finite for almost every x 
and also that: 



ll/*gllr<ll/lll». 



(89) 



We may now proceed with the proof of lemma [3] regarding 
the domain of £ and its continuity properties: 

Proof of Lemma [?} To show that £ is finite for all tame 
functions p e Q, we will study £[p] term by term. To that 
end, let p : K.+ — > R be tame for some exponent s > 0; that is, 
assume that J\p\ 1+e < oo and that J xp(x)dx < oo. We then 
have: 

. The first term of E[p] is finite by definition. 
. The second term in (TTTt can be written as: 



J" p(x) log x dx 

■X 



< j * \p(x)\ogx\ dx 

Xoo 
\p(x) logx\ 



dx. 



Since logx < x for x > 1, the second integral will be 
bounded from above by J x\p(x)\ dx < oo. As for the first 
integral, set r — 1 + s and s = 1 + - so that r~' + s~ l = 1. 
Now, if ^[o,i] is the indicator function of [0, 1], note that 
/ [f [0,i] logx| s dx = L |logx| s Jx < oo for all s > -1. As 
a result, Holder's inequality yields: 



X' 

Jo 



\p(x)logx\ dx = \\p -xio,i] log Hi 



< \\p\\i + s ■ \\X[0,i] log lli+i/ e < oo (90) 
on account of p being L 1+£ -integrable. 
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. For the last term of £, let D + = {(x, y) e K 2 : y > x } and 
note that: 

JJ P(x)p(y) log \x - y\ dy dx 
- 2 ff \p(x)p(y)log\x-y\\dydx 

Xoo /-*co 
\p(x)\J \ P (y)-\og(y-x)\dydx. (91) 

Now, the innermost integral can be written in the form: 



f 

-r 

Jo 

lpCy)I^Cy-^)rfy+ I 

o Jo 



\p(y)\ ■ I logO -x)\dy 
\p(x + w) ■ log w\ dw 

\p(x + w)K(w)\ dw + I \p(x + w) log w\ dw 



r 



+x+w) log(l +w)\ dw. 

(92) 

where /if(vv) is the kernel: 

log \w\, < w < 1 



0. 



otherwise. 



(93) 



As above, K will be L 5 -integrable for all s > -1 and, in 
particular, for s = 1 + -. Therefore, we will have: 



Jo Jo 



|p(y)|iT(jc-y)rfy£/jc=||bl-(|p|*|in)|| 1 
< \\p\\i +e -\\p*K\\ M/e 

< \\p\\l + e ■ Mil ■ + < oo (94) 



where the penultimate step is an application of Holder's 
estimate and the last one follows from Young's inequality. 
Finally, the second integral of ( f9Zb can be estimated by: 



f 

Jo 



|/9(1 + x + w) log(l + w)\ dw 



: I \p(l + x + w)\wdw 
Jo 



< Cx 



o 



w\p(w)\dw (95) 



for some sufficiently large C > 0. Then, since p is tame 
(i.e. Jw\p(w)\dw < oo), we may integrate d95l l over x to 
finally obtain that £[p] < oo. 
This completes the proof that £[/?] is finite for all tame 
functions p e Q. To show that £ is continuous on all 
subspaces of L 1+E -integrable functions with finite absolute 
mean, it simply suffices to note that all our estimates of £[p] 
are bounded by the L l+e norm of p. ■ 

Remark. If a function is in U for some r > 1 and has finite 
mean, it will necessarily be in L 1 as well; in this way, tame 
measures form a (dense) subspace Q of L (R+) that is similar 
to the union (Je>o L l+S . 

We will now prove Lemma [4] showing that £ is not only 
continuous but also convex over the (convex) domain X of 
tame probability measures. 



Proof of Lemma^} Let p, q € X be two tame probability 
measures and introduce the bilinear pairing: 



(p,q) 



ff 



p(x)q(y)\og\x-y\dxdy (96) 



which is actually well-defined on the whole space Q (as can 
be seen by the proof of lemma [3}. Since the first two terms of 
£ are linear (and hence convex), it will suffice to show that: 

<(1 - t)p + tq,(l- t)p +tq)<{\- t)(p, p) + t(q, q) (97) 

for all t 6 (0, 1). Indeed, if we let <p = p — q e SI, equation (l97| > 
reduces to showing that the pairing (■,■) is an inner product 
on the subspace of densities with zero total charge, i.e. that: 

(4>,<P)>0 (98) 

for any nonzero tame (f> e SI with J(f>(x)dx = J (p(x) - 
q(x))dx = 0. 

From the point of view of electrostatics, this is plain to see: 
after all {<f>, <p) is just the self-energy of the charge density <p. 
More specifically, let us define D + = {(x,y) : x < y} as in the 
proof of lemma [3] Then we will have: 

(<p,(f>) = -2 I 4>(x)<t>(y)\og\x-y\dxdy 
Jd + 

<P(x) I 4>(y)log(x-y)dydx 

o Jo 

> 2 I 4>(x) I <t>(y)(y - x) dy dx (99) 
Jo Jo 

So, if we set <&(x) = JT cf>(y) dy and integrate by parts, we get: 

<p(x) I y(p(y)dydx- I x(p(x)®(x) dx 
o Jo Jo 



$>(y) dy dx 



-fMf 

I 2 (x) dx - <D(oo) J 0>(y)dy 
Jo Jo 



> 



(100) 



/-OO 

since <I>(oo) = J Q cf>(x) dx — = <D(0) on account of <f> having 
zero total charge. ■ 

Appendix B 
Construction of the Coulomb gas model 

In this appendix we will briefly show how the transition 
from discrete to continuous eigenvalue measures discussed 
in Section IH-AI occurs. As in the main text, we will not 
present any formal proof here either. However, we will argue 
that treating the formally discrete distribution of eigenvalues 
appearing in (O, as continuous in the large limit is quite 
reasonable. A more formal method showing the same result 
appears in [13]. The main reasoning, also discussed in the 
main text is that the external confining potentials defined by 
the first two terms in (0 or (13[ are strong enough to overcome 
the logarithmic repulsion between eigenvalues (third term in 
and therefore guarantee that (with high probability) most 
of the eigenvalues will be confined in a finite width region 
near the minimum of the external potential. At the same time, 
this will mean that the eigenvalue density per unit length will 
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be scaling with if is large enough. As a result, this can 
be seen as a high-density limit and therefore the continuous 
approximation for the measure will be valid, at least close to 
configurations whose energy is low enough. 

In the remainder of this section we will motivate the 
transition from the discrete to continuous eigenvalue densities 
and show what kind of terms we expect to see. We start by 
focusing in a finite region of eigenvalues of length D. We then 
divide the integration over Ak in dTOb in L segments of length 
i, such that Li = D. The length of each segment t has to be 
small enough so that the energy (O can be well approximated 
with all eigenvalues eigenvalues within a given segment being 
placed at the endpoint of the segment. At the same time, it has 
to be large enough so that there is a macroscopic (i.e. 0(N)) 
number of eigenvalues inside each segment. In principle, at 
the end of this exercise we need to take the limit t — > as 
well, however we will discuss the subtleties of this limit later 
on. As a result, the integral over DA can be written as: 



( L \ 



/w-n zfn z 



( N \ 



Nl( N 



n m= i n m \ 

2 exp -mj^pim^logipime)) 



A'-l Vwfy = l / m=l \/j fiI -0> 
L ( N \ 



(101) 
(102) 



where n m are the number of A^'s that appear in the rath 
segment, with constraint £ m n m = N. The factorials appearing 
at the RHS of fllOlt are the number of ways the N eigenvalues 
can be re-arranged in L segments. This factor constitutes the 
entropy term and, for large N and n m , we can apply Stirling's 
formula to get the exponent in (1102b (where p{mt) = n m /(N£) 
is the fraction of eigenvalues per unit length appearing in 
segment ra). 

We next look at the form of the energy in © 

E(A) ~ P('nf) (mt - (/? - 1) log ml) (103) 

+ f ^ p{ml)p{m't) log |(m - m')l\ 

mtm' 
m 

The last term captures the repulsing interaction between eigen- 
values in the same segment m. The value of a m represents the 
typical distance between eigenvalues in segment ra in units 
of I and therefore is a number of order unity. We may now 
let I — > 0, which will make the sums converge to integrals 
^Zm — * J dx and p(m{) can be written as a continuous 
function p(x. Representing the sum over all possible states 
(i.e. the product of sums in CQB) ) by / -Dp we can now get 



Z ~ [l)p c-" 2 * 1 ^" dx P(.x) l °SP(x) e N j dxp(x)\o g d(x) 



(104) 



where d(x) = a m £ is the average distance between eigenvalues 
at the position x = ml. One can estimate this average inter- 
eigenvalue distance to be 



d{x) ~ a m i 



1 



Np(x) 



(105) 



This was first proposed by Dyson [9], [13], [16] and was 
shown explicitly more recently in [29]. It is remarkable that 
with this choice of d(x) the O(A0 dependence on p(x) in the 
exponent of ( 1 104b vanishes. This surprising fact is true only for 
complex matrices [16] in which, up to uninteresting constants, 
the leading correction to the N 2 £[p] term in the exponent is 
0(1). 

Appendix C 
Solution of the Variational Equation 

In this appendix, we give a more detailed account of the 
solution of the variational equation: 

SLdp] = 

where Hi is the Lagrangian function of d27l i. To that end, if 
<p e Q. is tame, we get: 

Li[p + t(f>] = £i[p] + f£i[0] 

~ 2t jj (f>(x)p(y)\og\x-y\dydx+0(t 2 ) (106) 

and a simple differentiation at t — yields: 



(6Ldp],<f>)= -r 
dt 



1=0 

-2 



£i[p + f0] 

! JJ 4>(x)p(y) log \x - y\ dy dx 

= j 4>(x) v P[p,x]dx, (107) 
where the expression ^[p, x] is given by: 

W[p, x] = 2 J p(y) log \x-y\dy-x 

+ 08- l)logx+ c+ Hog(l + px) + v(x). (108) 

Thus, for the above expression to vanish identically for all 
<p e £1, we must have ^[p, x] = 0, and this is precisely (|34l : 

2 I p(x)\og \x - x'\ dx' — x - (/J - l)logx 
Jo 

- c -k\og(\ + px)-v{x). (l34b 

Having derived this stationarity equation in terms of p, we 
will devote the rest of this appendix to the expression d35l l that 
is obtained after differentiating d34b above: 



2!P 



f 



p(y) 



dy = 1 



= f{x) 



x — y x x + z 

for all x € [a, b] (cf. section III-BI ). This integral equation is 
known as the airfoil equation and can be studied with the help 
of the finite Hilbert transform [22]: 



7W( 



J-iy-x 



dy. 



(109) 



If r > 1, the T-transform maps U to U but, nevertheless, it 
lacks a unique inverse^ Indeed, the kernel of 7 is spanned by 
the function oj(x) = (1 -x 2 )~i: 7[cu](x) = for all x € (-1, 1). 

8 This is a remarkable difference from the case of the infinite Hilbert 
transform which integrates over all R and which is invertible [22]. 
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Outside this kernel, the solutions <p to the airfoil equation 
T[0] = g with d>,g e U[-l, 1] will satisfy [22]: 



-r gQy) 

— x 2 y — x 



dy + 



vr 



(HO) 



where c is an arbitrary constant that stems from the fact that 
any two solutions of the airfoil equation differ by a multiple 
of U)(x) = (1 - jc 2 )~2 . 

Hence, after rescaling the interval [-1,1] to [a,b], the 
solution of the stationarity equation d35l l will be given by: 

(ill) 



p(x) = 



Ja 



2tt 2 y(x — a)(b — x) 



whenever / is itself L I+e -integrable. So, by substituting f(x) - 
1 - — - -|- from ( f33T > and performing one last integration, 
we obtain the final result ( 1361 , 

It is worthwhile to mention here again how this procedure 
breaks down if we allow the support of p to extend to a = 
for /3 > 1 : in that case, the function / also extends all the way 
to a — and the term — makes it non-integrable. However, 
since the Hilbert transform preserves Z/-integrability for r > 1 
and p is assumed tame (and hence L 1+£ -integrable), equation 
(l35l l would equate an integrable function with a non-integrable 
one, thus yielding a contradiction. Therefore, as we stated in 
section III-BI solutions with a — are physically inadmissible 
when {S > 1 . 

Appendix D 
Proof of uniqueness of solution of d39ll,(l40l 

In order to show that ( |39l l, (l40b admit a unique solution, 
we start by observing that for fixed k,/3 and z, d39l l has 
a unique positive solution a < b. Then, from the implicit 
function theorem, this solution can be captured in terms of 
b by a smooth function a(b) whose derivative can be obtained 
implicitly from (|39l l (and which is negative). With this in mind, 

rb 

the normalization integral g(b) — J „. p(x) dx takes the form: 

a(b) + b 1 
gib) = — 4 — + - \z-k 



■08-D 1 + 



yfa(F)b) 

and this is actually an increasing function of b. Indeed, after 
a somewhat painful calculation, one obtains: 



g'ib) = 



1 



1 + 



08- l)z 



y/a(b) b 3 



b - a(b) 
b + z 



> 



(112) 



However, with a(b) decreasing and bounded below by 0, 
this last equation yields g'(b) > 1/8 for large enough b, 
i.e. lim/^oo g{b) = +°o. So, by continuity, there will be a 
(necessarily) unique b* such that g(b*) = 1. Hence, the pair 
a* = a(b*), b = V will be the unique solution to (|39l l, d40b . 
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